System and method for removing data from processor caches in a distributed multi-processor computer system

ABSTRACT

A processor ( 600 ) in a distributed shared memory multi-processor computer system ( 10 ) may initiate a flush request to remove data from its cache. A processor interface ( 24 ) receives the flush request and performs a snoop operation to determine whether the data is maintained in one of the local processors ( 601 ) and whether the data has been modified. If the data is maintained locally and it has been modified, the processor interface ( 24 ) initiates removal of the data from the cache of the identified processor ( 601 ). The identified processor ( 601 ) initiates a writeback to a memory directory interface unit ( 22 ) associated with a home memory ( 17 ) for the data in order to preserve the modification to the data. If the data is not maintained locally or has not been modified, the processor interface ( 24 ) forwards the flush request to the memory directory interface unit ( 22 ). Memory directory interface unit ( 22 ) determines which remote processors within the system ( 10 ) have a copy of the data and forwards the flush request only to those identified processors. The identified processors then remove the data from their respective caches in response to the flush request. If an identified remote processor has modified data, the identified remote processor initiates a writeback to the memory directory interface unit ( 22 ) for preservation of the modified data.

CROSS-REFERENCE TO RELATED APPLICATIONS

The present application is a continuation and claims the priority benefit of U.S. patent application Ser. No. 14/141,326 filed Dec. 26, 2013, which is a continuation and claims the priority benefit of U.S. patent application Ser. No. 09/909,700 filed Jul. 20, 2001, now U.S. Pat. No. 8,635,410, that issued on Jan. 21, 2014, which claims the priority benefit of U.S. provisional application 60/219,951 filed Jul. 20, 2000, the disclosures of which are incorporated herein by reference.

BACKGROUND OF THE INVENTION

Field of the Invention

The present invention relates in general to a multi-processor computer system and more particularly to a system and method for removing data from processor caches in a distributed multi-processor computer system.

Background of the Invention

When building a distributed shared memory system based on multiple nodes of snoopy front side processor buses, it is useful to have a technique to guarantee that no processor has a copy of a cache line and that the memory has the only up to date copy. The standard mechanism provided by the snoopy front side processor bus can perform such an operation only locally, as it has no knowledge of other buses in the system. Also, cache flush requests are typically sent to each processor whether or not the processor has a copy of the cache line to flush. Therefore, it is desirable to handle cache flushes on a global level without occupying the system with unnecessary flush requests.

SUMMARY OF THE INVENTION

From the foregoing, it may be appreciated by those skilled in the art that a need has arisen for a technique to remove data from caches in a distributed shared memory multi-processor computer system. In accordance with the present invention, a system and method for removing data from processor caches in a distributed multi-processor computer system are provided that substantially eliminate or reduce disadvantages and problems associated with conventional cache flushing techniques.

According to an embodiment of the present invention, there is provided a method for removing data from processor caches in a distributed multi-processor computer system that includes receiving a request to remove data. Upon receiving the request to remove data, a determination is made as to whether the data is maintained locally in a cache of each of a plurality of local processors and whether the data has been modified in response to the data being maintained locally. If the data is maintained locally and has been modified, the data is removed from the local cache. If the data is not maintained locally or has not been modified, the request to remove data is transferred to a memory directory associated with a home location for the data. The memory directory interface unit determines which caches in the computer system maintain a copy of the data and then forwards the request to remove data to each processor having a cache maintaining the data.
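The two-stage decision described above can be illustrated with a minimal sketch. It is not taken from the specification: the `Processor` class, the `handle_flush` function, and the `directory` mapping are hypothetical names, cache contents are reduced to a "clean" or "modified" tag per address, and the writeback to home memory is not modeled.

```python
from dataclasses import dataclass, field


@dataclass
class Processor:
    """Hypothetical stand-in for a processor and its cache."""
    name: str
    # address -> "clean" or "modified"
    cache: dict = field(default_factory=dict)


def handle_flush(address, local_processors, directory):
    """Return the names of the processors whose copy was removed.

    `directory` is an assumed mapping from address to the list of
    processors that the home memory directory records as holding a copy.
    """
    # Step 1: snoop the local processors for a modified copy.
    for proc in local_processors:
        if proc.cache.get(address) == "modified":
            # Removal only; the writeback to home memory is not modeled.
            del proc.cache[address]
            return [proc.name]

    # Step 2: not held locally, or not modified. Ask the home directory,
    # which forwards the request only to processors recorded as holders.
    removed = []
    for proc in directory.get(address, []):
        if proc.cache.pop(address, None) is not None:
            removed.append(proc.name)
    return removed
```

In this sketch a processor that the directory does not list for the address is never contacted, which is the behavior the summary attributes to the directory-based forwarding.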

The present invention provides various technical advantages over conventional cache flushing techniques. For example, one technical advantage is to prevent a global flush request from being sent to all processors in the system. Another technical advantage is to send a flush request into the system only when necessary and only to processors maintaining data subject to the flush request. Other technical advantages may be readily apparent to those skilled in the art from the following figures, description, and claims.

BRIEF DESCRIPTION OF THE DRAWINGS

For a more complete understanding of the present invention and the advantages thereof, reference is now made to the following description taken in conjunction with the accompanying drawings, wherein like reference numerals represent like parts, in which:

FIG. 1 illustrates a block diagram of a distributed shared memory computer system;

FIG. 2 illustrates a block diagram of a node in the distributed shared memory computer system;

FIG. 3 illustrates a block diagram of the distributed shared memory computer system handling numerous writebacks initiated by a processor;

FIG. 4 illustrates a block diagram of the distributed shared memory computer system handling a transfer of cache line ownership;

FIG. 5 illustrates a block diagram of the distributed shared memory computer system handling concurrent snoop and read operations;

FIG. 6 illustrates a block diagram of the distributed shared memory system performing a cache flush operation.

DETAILED DESCRIPTION OF THE INVENTION

FIG. 1 is a block diagram of a computer system 10. The computer system 10 includes a plurality of node controllers 12 interconnected by a network 14. Each node controller 12 processes data and traffic both internally and with other node controllers 12 within the computer system 10 over the network 14. Each node controller 12 may communicate with one or more local processors 16, a local memory device 17, and a local input/output device 18.

FIG. 2 is a block diagram of the node controller 12. The node controller 12 includes a network interface unit 20, a memory directory interface unit 22, a front side bus processor interface unit 24, an input/output interface unit 26, a local block unit 28, and a crossbar unit 30. The network interface unit 20 may provide a communication link to the network 14 in order to transfer data, messages, and other traffic to other node controllers 12 in computer system 10. The front side bus processor interface unit 24 may provide a communication link with one or more local processors 16. The memory directory interface unit 22 may provide a communication link with one or more local memory devices 17. The input/output interface unit 26 may provide a communication link with one or more local input/output devices 18. The local block unit 28 is dedicated to processing invalidation requests and handling programmed input/output operations. The crossbar unit 30 arbitrates the transfer of data, messages, and other traffic for the node controller 12.

Each processor 16 includes at least one cache to temporarily store data from any memory 17 within system 10. Data is typically stored in a cache of processor 16 as individual cache lines of 132 bytes each that include 128 bytes of data and 4 bytes of directory information including its state and other control information pertaining to the data associated with the cache line. The directory information includes everything which needs to be known about the state of the cache line in the system as a whole and the data portion holds the data associated with the cache line unless another part of the system has a current copy of the cache line before it has been updated in the memory. Memory directory interface unit 22 includes memory references to data stored within its corresponding memory and which processors within system 10 have a copy of that data. Processor 16 may request data from any memory 17 within system 10 through accesses to the memory directory interface unit 22 corresponding to the memory containing the data. If the data is held in the cache of another processor, the data may be retrieved from that other processor according to a protocol scheme implemented within system 10. Memory directory interface unit 22 responds to incoming messages from anywhere within system 10 and updates the state of a particular cache line and generates messages in response to the incoming messages.

System 10 accesses memory resident data and system state and reliably shares data between cooperating processor nodes and/or peer input/output nodes through a protocol scheme. The protocol scheme is specified through four correlated attribute sets. The attribute sets are the transient and stable sharing state associated with each parcel of data as viewed at its home location, the transient and stable state associated with each remote copy of a parcel of data, the specific request and response message types used in communications between entities within system 10, and the action taken in response to these messages. Actions taken may include state transitions, bus transactions, and reply messages.

Four subset protocols may be included in the overall system protocol scheme. These protocols include a memory protocol for the coherent or non-coherent access to main memory resident data, a programmed input/output protocol for access to miscellaneous system state and control mechanisms, a graphics flow control protocol for applying localized flow control on a processor which is streaming writes to a graphics peripheral, and an administrative protocol for use in maintenance and configuration procedures and for implementation specific functionality. The memory protocol requires no network ordering of any kind. Messages may be freely reordered even within a single virtual channel between a single source and destination. The programmed input/output protocol uses a hybrid network ordering technique. PIO request messages are delivered in order from a particular source to a particular destination. This ordering is preserved even for PIO request messages to different addresses. Thus, all PIO request messages from a source node to a particular destination node are delivered in the same order in which they are sent regardless of whether the destination for the message has the same or different address. PIO reply messages require no network ordering as they may be delivered to the originating node in an order different from that in which they were sent by the target of the PIO request message. The graphics flow control protocol uses the same hybrid network ordering technique as the programmed input/output protocol. Administrative messages require no network ordering of any kind and may be freely reordered as in the memory protocol.

The protocol scheme is a non-blocking request/reply protocol technique preferably optimized for processor 16 front side bus and cache coherence implementation. The protocol scheme extends the Modified/Exclusive/Shared/Invalid (MESI) cache coherence protocol, used to maintain coherence within an individual processor bus, throughout system 10. The technique maintains coherence related sharing state for each cache line sized parcel of physical data in a special directory structure. The state of remotely held copies of a cache line is maintained in a similar fashion at the remote locations using a cache to hold the current copy of the cache line, its address tag, and its current state.

Various features are provided by the protocol scheme. Messages that cannot be serviced when they reach the memory are NACK'd rather than stalled or buffered in order to provide the non-blocking functionality. Two virtual channels are used, one for request messages and one for reply messages. Messages may be arbitrarily reordered within system 10. Three hop forwarding of dirty data may be provided directly from the owner of the data to the requester as long as sufficient network resources are available. Each request message includes an echo field whose contents are returned with every reply message associated with the original request message. Dynamic backoff is supported to restrict the request/reply protocol during network congestion. Implicit writebacks are handled and all forms of writebacks are acknowledged. Private data optimization is provided wherein lines may be requested read shared but exclusive is preferred if convenient. Non-allocating reads (get operations) and out of the blue cache line writes (put operations) allow for intra-cluster page migration and block copies and inter-cluster communications. Silent drops of clean exclusive (CEX) and shared (SHD) data in processor caches are provided as well as CEX replacement hints. Also, fairness and starvation management mechanisms operate in conjunction with the core protocol scheme to increase message service fairness and prevent message starvation.

Other features include exclusive read-only request messages that retrieve data in a read-only state but also remove it from all sharers in the system. This operation is preferably used for input/output agent prefetching as it permits any node in system 10 to receive a coherent copy of a cache line. An input/output agent may also guarantee to self-invalidate an exclusive read-only line from its cache after a certain period of time through a timed input/output read in order to eliminate a need for the directory to send an invalidate request message to the input/output agent. This feature optimizes the expected input/output prefetching behavior and adds additional RAS resiliency in that a missing invalidate acknowledgment from an input/output agent can be ignored once the timeout period has elapsed.

Directory state is maintained in separate directory entries for each cache line in the main resident memory. Each entry contains a line state representing a fundamental sharing state of the cache line, a sharing vector tracking which nodes and processors have a copy of the cache line in question, a priority field specifying the current priority of the directory entry for use in the fairness/starvation mechanism, and a protection field determining what types of accesses are permitted and from which nodes.

In this embodiment, the directory tracks 29 different states for each cache line. Fewer or more states may be tracked as desired for a particular implementation. Table I provides an example of the different states. Of the states listed in Table I, there are four stable states with the remaining states being transient and used to track the progress of a multi-message transaction in which the directory receives a request message, forwards some sort of intermediate message, and waits for a response message before completing the transaction and returning the particular cache line to one of the four stable states.

TABLE I

Group | Name | Description
Stable states | UNOWN | Line is not cached anywhere; only copy of the line is in memory.
Stable states | SHRD | Line is cached in a read-only state by one or more nodes. All cached copies of the line are identical to the one in memory.
Stable states | EXCL | Line is cached in a read/write state by exactly one node. The cached copy of the line is more up to date than the copy in memory.
Stable states | SXRO | Line is cached in a read-only state by a single node in the system. This state is the result of a read exclusive read-only request.
Transient states for read to exclusive line | BUSY | Sent intervention; rcvd nothing from new owner, nothing from old.
Transient states for read to exclusive line | BSYEI | Sent intervention; rcvd IWE from new owner, nothing from old.
Transient states for read to exclusive line | BSYUW | Sent intervention; rcvd WRBK/WRBKR from new owner, nothing from old.
Transient states for read to exclusive line | BSYUR | Sent intervention; rcvd RQSH/RQSHR from new owner, nothing from old.
Transient states for read to exclusive line | BSYEN | Sent intervention; rcvd first half of response from old owner; do not write further data from old owner. Eventual state is EXCL.
Transient states for read to exclusive line | BSYEY | Sent intervention; rcvd first half of response from old owner; allow writes of further data from old owner. Eventual state is EXCL.
Transient states for read to exclusive line | BSYSN | Sent intervention; rcvd first half of response from old owner; do not write further data from old owner. Eventual state is SHRD.
Transient states for read to exclusive line | BSYSY | Sent intervention; rcvd first half of response from old owner; allow writes of further data from old owner. Eventual state is SHRD.
Transient states for read to exclusive line | BSYUN | Sent intervention; rcvd first half of response from old owner; do not write further data from old owner. Eventual state is UNOWN.
Transient states for read to exclusive line | BSYUY | Sent intervention; rcvd first half of response from old owner; allow writes of further data from old owner. Eventual state is UNOWN.
Transient states after issuing a FLSH or ERASE | BSYF | Sent FLSH/ERASE, nothing received yet.
Transient states after issuing a FLSH or ERASE | BSYFN | Waiting on second half of FLSH/ERASE result, data received.
Transient states after issuing a FLSH or ERASE | BSYFY | Waiting on second half of FLSH/ERASE result, no data received.
Transient states for GET to exclusive line | BUSYI | Tracking down an invalid copy for a GET.
Transient states for GET to exclusive line | BSYIW | Tracking down an invalid copy for a GET, have received a writeback from the owner.
Transient states for GET to exclusive line | BSYG | Sent ININF, nothing received yet.
Transient states for GET to exclusive line | BSYGN | Waiting on second half of ININF result, data received.
Transient states for GET to exclusive line | BSYGY | Waiting on second half of ININF result, no data received.
Transient states for timed read-exclusive read-only requests | BSYX | Sent INEXC; nothing received yet.
Transient states for timed read-exclusive read-only requests | BSYXN | Sent INEXC and waiting for second half of result; data received.
Transient states for timed read-exclusive read-only requests | BSYXY | Sent INEXC and waiting for second half of result; no data received.
Transient states for non-timed read-exclusive read-only requests | BSYN | Sent INEXC; nothing received yet.
Transient states for non-timed read-exclusive read-only requests | BSYNN | Sent INEXC and waiting for second half of result; data received.
Transient states for non-timed read-exclusive read-only requests | BSYNY | Sent INEXC and waiting for second half of result; no data received.
Miscellaneous states | POIS | Line has been marked as inaccessible. Any attempt to read or write to the line will receive a PERR error response. This state can be entered only by a backdoor directory write by the OS.
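The four fields of a directory entry and the stable/transient distinction can be pictured with a small sketch. The class and field names below are hypothetical, and plain Python integers stand in for hardware fields whose widths are not given here; only the four stable state names come from Table I.

```python
from dataclasses import dataclass

# The four stable states named in Table I; every other state is transient.
STABLE_STATES = frozenset({"UNOWN", "SHRD", "EXCL", "SXRO"})


@dataclass
class DirectoryEntry:
    """Illustrative directory entry; field types are assumptions."""
    line_state: str = "UNOWN"   # one of the Table I state names
    sharing_vector: int = 0     # interpretation depends on line_state
    priority: int = 0           # used by the fairness/starvation mechanism
    protection: int = 0b00      # access permissions for local/remote nodes

    def is_transient(self) -> bool:
        # A transient state means a multi-message transaction is in flight.
        return self.line_state not in STABLE_STATES
```

For instance, `DirectoryEntry(line_state="BSYF").is_transient()` evaluates to `True`, reflecting a flush that has been sent but not yet answered.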

Information in the sharing vector tracks the location of exclusive or shared copies of a cache line as required to enforce the protocol that maintains coherence between those copies and the home location of the cache line. The sharing vector may be used in one of three ways depending on the directory state. The sharing vector may be in a pointer format as a binary node pointer to a single processor node or input/output node. This format is used when the state is EXCL as well as in most transient states. The sharing vector may be in a pointer timer format as a combination of an input/output read timer and a binary node pointer. This format handles the read exclusive read-only (RDXRO) transaction. The sharing vector may be in a bit vector format as a bit vector of sharers. The field is preferably partitioned into a plane bit vector, a row bit vector, and a column bit vector. This format is used when the cache line is in a SHRD state. Examples of the use of the sharing vector can be found in copending U.S. application Ser. No. 08/971,184, now U.S. Pat. No. 6,633,958, entitled “Multiprocessor Computer System and Method for Maintaining Cache Coherence utilizing a Multi-dimensional Cache Coherence Directory Structure” and in copending U.S. Application entitled “Method and System for Efficient Use of a Multi-dimensional Sharing Vector in a Computer System”, both of which are incorporated herein by reference.
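One way to picture the partitioned bit vector format is sketched below. This is an assumption made for illustration, not a description of the referenced applications: each node is given hypothetical (plane, row, column) coordinates, adding a sharer sets one bit in each of the three vectors, and the set of nodes to contact is taken as every combination of set bits, which may be a superset of the true sharers.

```python
from itertools import product


def add_sharer(vec, plane, row, col):
    """Record a sharer at assumed coordinates (plane, row, col)."""
    planes, rows, cols = vec
    return (planes | (1 << plane), rows | (1 << row), cols | (1 << col))


def set_bits(mask):
    """Return the indices of the bits set in an integer bit mask."""
    return [i for i in range(mask.bit_length()) if (mask >> i) & 1]


def candidate_sharers(vec):
    """Every (plane, row, col) combination implied by the three vectors."""
    planes, rows, cols = vec
    return list(product(set_bits(planes), set_bits(rows), set_bits(cols)))
```

Under these assumptions the encoding is compact but imprecise: recording sharers at (0, 1, 2) and (1, 1, 3) yields four candidates, so a node that never held the line could still receive an invalidate.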

Each directory entry includes a priority field. Each incoming read request message also includes a priority field. When the incoming request message reaches the directory mechanism, its priority field is compared to the priority field in the associated directory entry. If the priority of the incoming request message is greater than or equal to that in the directory entry, the request message is allowed to be serviced normally. The result of servicing determines how the directory priority is updated. If the request message was serviced successfully, then the priority of the directory entry is reset to zero. If the request message was not serviced successfully, the priority of the directory entry is set to the priority of the request message. If the priority of the incoming request message is less than the priority of the directory entry, then the request message is not permitted to be serviced. A NACK is returned and the priority of the directory entry is not altered.
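The priority comparison can be sketched as a small function. The function name, the `try_service` callback, and the string replies are hypothetical; returning a NACK when servicing fails is an assumption consistent with the non-blocking behavior described earlier.

```python
def service_request(dir_priority, req_priority, try_service):
    """Return (reply, new_directory_priority) for one incoming request.

    `try_service` is an assumed callback that returns True when the
    request could be serviced and False otherwise.
    """
    if req_priority < dir_priority:
        # Lower priority than the entry: refuse, leave the entry unchanged.
        return "NACK", dir_priority
    if try_service():
        # Serviced successfully: the entry's priority is reset to zero.
        return "OK", 0
    # Allowed but not serviceable: remember the requester's priority.
    return "NACK", req_priority
```

A request that keeps failing therefore raises the entry's priority to its own, so that lower-priority competitors are turned away until it succeeds.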

The protection field in the directory entry is used to determine whether request messages for a cache line are allowed to be serviced. For protection purposes, all nodes in the system are classified as local or remote. Local/remote determination is made by using a source node number in the request message to index a local/remote vector stored in the memory directory. If the bit in the local/remote vector corresponding to the source node number is set, the access is classified as local. If the bit is cleared, the access is classified as remote. Once local/remote classification has been made, the protection bits in the protection field in the directory entry determine if the access is allowed. To implement the protection scheme, all request messages are classified as reads or writes. Any read request message to a cache line for which the requestor does not have at least read-only permission will be returned as an access error reply and no directory state updates of any kind will occur. Any write request message for which the requestor does not have read/write permission will be returned as a write error reply and no directory state updates of any kind will occur nor will the write data be written to memory. Table II shows an example of possibilities for local and remote access.

TABLE II

Protection Value | Local Access Allowed | Remote Access Allowed
00 | Read/Write | Nothing
01 | Read/Write | Read-only
10 | Read/Write | Read/Write
11 | Read-only | Read-only
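The protection check can be sketched as a lookup keyed by the Table II values. The function and variable names are hypothetical, the local/remote vector is modeled as a Python integer bit mask, and the AERR and WERR reply names are borrowed from the error replies listed in Table VII as an assumed mapping for the access error and write error replies.

```python
# Table II: protection value -> (local permission, remote permission)
PROTECTION = {
    0b00: ("rw", "none"),
    0b01: ("rw", "ro"),
    0b10: ("rw", "rw"),
    0b11: ("ro", "ro"),
}


def check_access(protection, local_remote_vector, source_node, is_write):
    """Return "OK", "AERR" (read refused), or "WERR" (write refused)."""
    # A set bit at the source node's index classifies the access as local.
    is_local = bool((local_remote_vector >> source_node) & 1)
    local_perm, remote_perm = PROTECTION[protection]
    perm = local_perm if is_local else remote_perm
    if is_write:
        return "OK" if perm == "rw" else "WERR"
    return "OK" if perm in ("rw", "ro") else "AERR"
```

With a vector of `0b0101` (nodes 0 and 2 local), a read from node 1 under protection value 00 is refused, while the same read under value 01 is allowed.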

The memory protocol is implemented cooperatively by the home memory directories and the various remote entities including the processors and associated processor interfaces, processor managed DMA mechanisms, and peer IO nodes. The transient sharing state of coherence transactions at the remote locations is maintained in small associative memories called coherent request buffers (CRBs). Entities that have globally coherent caches of system memory image also have internal state that is included in the implementation of the coherence related protocol. For these situations, a CRB tracks the transient state of interactions between it and the processor cache hierarchies across the front side bus. The cached memory hierarchy implements a MESI protocol identifying four stable coherence states for each of the cache lines in the system. The processor coherence states are shown in Table III.

TABLE III

IA-64 Cache Line State | Description | SN2 name | SN2 Mnemonic
Invalid | not present in this cache hierarchy | invalid | INV
Shared | read-only copy of line present in this cache hierarchy | shared | SHD
Exclusive | writable copy of line present in this cache hierarchy | clean exclusive | CEX
Modified | copy that is present is newer than the one in memory | dirty exclusive | DEX

Several major categories of transactions are tracked remotely. These include locally initiated read request messages, locally initiated write request messages, and incoming intervention requests. Interventions are received if the remote entity maintains a coherent locally cached image of global memory. In some cases, it may be convenient and efficient to manage separate CRBs for each category of request. Otherwise, a single CRB structure may be sufficient.

Information that is tracked in a remote CRB includes an address field, a state field, a type field, a counter field, a doomed field, a speculative reply field, and a NACK field. The address field includes the system address of the request message. The state field includes the current state of a transaction. If FREE, no transaction is being tracked with this directory entry. The type field specifies the type of request message. The counter field serves as a signed binary counter and is used to count invalidate acknowledgments. The doomed field tracks whether a cache line was invalidated while a read request message for it was outstanding. If the doomed field is set when the read response message returns, the read request message is retried. The speculative reply field tracks which part of a speculative reply message has been received. The NACK field counts how many times a request message has been NACK'd. This value is used to implement the fairness/starvation mechanism and may be used to detect a request message that has been excessively NACK'd.

Other information that may be tracked includes additional information to fully characterize the current transaction so that it can be correctly implemented locally, such as on the local front side bus or IO interface with its own protocol requirements. Information may be tracked relating to local request messages or intervention request messages targeting the same address as a currently pending transaction. Optimizations and error handling information may also be indicated. Table IV summarizes information that may be tracked in a remote CRB.

TABLE IV

Category | Field | Description
 | A | Address of the request.
 | S/V | Transient state (FREE, BUSY, etc.).
 | T | Request type.
 | C | Invalidate ack count (max value = max # of possible sharers in a system).
 | | Doomed. Set if a read request is invalidated before the read data returns.
 | E | Speculative reply tracking.
 | NC | NACK counter (in support of starvation avoidance).
Conflicting local request pending | P | Pending request type. Indicates whether a second request has been issued to the same address and needs to be retried.
Conflicting intervention request pending | H | Held intervention type.
Conflicting intervention request pending | HS | Pointer to intervention source node.
Conflicting intervention request pending | ECHO | Echo field from held intervention message.
Auxiliary info needed to complete the transaction locally | DID | Deferred ID tag, as when IA-64 request was first issued on the bus.
Auxiliary info needed to complete the transaction locally | LEN | Size of data payload.
Auxiliary info needed to complete the transaction locally | SHD | Shared indication. Tracks whether another CPU on the bus had the line SHD or CEX. Determines whether read response can be placed in cache CEX or whether it must be placed in cache SHD.
Optimizations, error handling, etc. | K | Pending speculative read was satisfied locally before the response returned.
Optimizations, error handling, etc. | TO | Time out counter to identify hung transactions.
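The role of the doomed field can be illustrated with a minimal sketch. The `CRBEntry` class and the three handler functions are hypothetical, only a few of the Table IV fields are modeled, and the use of a single BUSY state for an outstanding read is a simplification.

```python
from dataclasses import dataclass


@dataclass
class CRBEntry:
    """Illustrative subset of the CRB fields; names are assumptions."""
    address: int
    state: str = "FREE"    # FREE means no transaction is being tracked
    doomed: bool = False   # line invalidated while a read was outstanding
    nack_count: int = 0    # number of times the request has been NACK'd


def start_read(entry):
    """Begin tracking a locally initiated read request."""
    entry.state, entry.doomed = "BUSY", False


def on_invalidate(entry):
    """An invalidate arrived; mark an outstanding read as doomed."""
    if entry.state == "BUSY":
        entry.doomed = True


def on_read_response(entry):
    """Return "RETRY" if the returned data is stale, else "DELIVER"."""
    if entry.doomed:
        entry.doomed = False   # the retried read starts with a clear flag
        return "RETRY"
    entry.state = "FREE"
    return "DELIVER"
```

In this sketch an invalidate that races ahead of the read data causes exactly one retry, after which the entry is freed on the next clean response.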

Processor 16 can issue several classes of bus transactions. Table V summarizes the request phase transactions. Status presented in the snoop phase (not present, hit clean, or hit dirty) of a front side bus transaction is also processed as it indicates the lumped sharing state of the requested cache line for all cache hierarchies on that front side bus.

TABLE V

Group | Name | Transaction | Description | Source (Proc, SHub)
READ | BRLD | Bus Read Line Data | 128-byte cache line data fetch | ✓ ✓
READ | BRLC | Bus Read Line Code | 128-byte cache line fetch | ✓
READ | BRIL | Bus Read Line and Invalidate | Read request for an exclusive (i.e., writable) copy of a cache line | ✓ ✓
READ | BRP | Bus Read Partial | Read 1-16 bytes from a non-cached page. | ✓
READ | BRCL | Bus Read Current Line | Probe for and acquire snap shot of dirty line without changing its state | ✓
READ | BIL | Bus Invalidate Line | Invalidates a cache line in all caches on the bus. | ✓
WRITE | BWL | Bus Write Line | Write of 128 bytes of data. Issued by a processor when evicting a dirty line from its cache hierarchy or when spilling a full line from its WC (write coalescing) buffers | ✓
WRITE | BCR | Bus Cache Line Replacement | Used to indicate that a processor has dropped a clean-exclusive line (also called relinquish: BRQSH) | ✓
WRITE | BWP | Bus Write Partial | Write of 1-64 bytes. Issued by a processor on a store to a non-cached page or when spilling a partially filled WC buffer. | ✓
MISC. | INT | Interrupt | Issues an interrupt to a specified processor. | ✓ ✓
MISC. | PTC | Purge TC | Requests a global translation cache (TLB) purge for a specified mapping from all processors on this bus. | ✓ ✓

Table VI shows examples of network request messages and Table VII shows network reply messages for the memory protocol. All network messages are classified as requests or replies. Each table specifies a message type, a mnemonic used to refer to the message type, a description of the message, a payload of the message whether it is a cache line or other payload, a supplemental field for the message, a source for the message, and a destination for the message. The supplemental field may include a priority value for managing fairness/starvation, a byte mask for non-coherent byte enabled writes, a payload length for non-coherent multi-word writes, a pointer to a target node for backoff operations, an invalidate acknowledgment count, a graphics credit return for flow control, and a sharing vector for invalidate operations. The source and destination are encoded as a directory at the home memory (D), a processor front side bus interface (P), a local IO or system support logic (L), and a peer node (X).

TABLE VI

Group | Subgroup | Name | Description | Payload (CL, Other), Suppl, Src (D P L X), Dest (D P L X)
READ | shared | READ | Read | Priority ✓ ✓
READ | shared | RDSHD | Read shared | Priority ✓ ✓
READ | exclusive | RDEXC | Read exclusive | Priority ✓ ✓ ✓ ✓
READ | exclusive | RDXRO | Read exclusive read-only, timed | Priority ✓ ✓ ✓
READ | exclusive | RDXRN | Read exclusive read-only, non-timed | Priority ✓ ✓ ✓
READ | GET | GET | Read invalid | Priority ✓ ✓
READ | GET | GETF | Read invalid, forced | Priority ✓ ✓
READ | etc. | AMOR | Atomic memory operation, read | ✓ ✓
READ | etc. | NCRD | Non-coherent read | ✓ ✓
WRITE | writeback | WRBK | Writeback | ✓ ✓ ✓
WRITE | writeback | WRBKR | Writeback, concurrent read outstanding | ✓ ✓ ✓
WRITE | writeback | IWE | Implicit writeback exclusive | ✓ ✓ ✓
WRITE | writeback | RQSH | CEX drop (relinquish) | ✓ ✓
WRITE | writeback | RQSHR | CEX drop, concurrent read outstanding | ✓ ✓
WRITE | PUT | PUT | Write invalidate | ✓ Priority ✓ ✓ ✓
WRITE | PUT | PFCL | Cache line flush | Priority ✓ ✓ ✓ ✓
WRITE | etc. | AMOW | Atomic memory operation, write | ✓ ✓ ✓
WRITE | etc. | NCWRD | Non-coherent write, doubleword | ✓ Mask ✓ ✓
WRITE | etc. | NCWRF | Non-coherent write, cache line | ✓ ✓ Length ✓ ✓
Probe | | INTER | Intervention shared | ✓ ✓ ✓ ✓
Probe | exclusive | INEXC | Intervention exclusive | ✓ ✓ ✓ ✓
Probe | exclusive | FLSH | Flush | ✓ ✓ ✓
Probe | exclusive | ERASE | Erase | ✓ ✓ ✓ ✓
Probe | GET | ININV | Intervention invalid | ✓ ✓ ✓ ✓
Probe | GET | ININF | Intervention invalid, forced | ✓ ✓ ✓ ✓
Probe | etc. | INVAL | Invalidate | ✓ ✓ ✓ ✓ ✓ ✓
INVAL generation | | BINEV | Backoff invalidate echo, vector format | Vector ✓ ✓ ✓ ✓
INVAL generation | | LINVV | Local block invalidate vector | ✓ Vector ✓ ✓

TABLE VII

Group | Subgroup | Name | Description | Payload (CL, Other), Suppl, Src (D P L X), Dest (D P L X)
READ | shared | SRPLY | Shared reply | ✓ ✓ ✓
READ | shared | SRESP | Shared response | ✓ ✓ ✓
READ | shared | SACK | Shared acknowledge | ✓ ✓
READ | shared | BINTR | Backoff intervention shared | ✓ Target ✓ ✓
READ | exclusive | ERPLY | Exclusive reply | ✓ Ack Cnt ✓ ✓ ✓ ✓
READ | exclusive | ESPEC | Exclusive speculative reply | ✓ ✓ ✓ ✓ ✓
READ | exclusive | ERESP | Exclusive response | ✓ ✓ ✓ ✓ ✓
READ | exclusive | EACK | Exclusive acknowledge | ✓ ✓ ✓ ✓
READ | exclusive | ERPYP | Exclusive reply, send PRGE | ✓ Ack Cnt ✓ ✓
READ | exclusive | BIEXC | Backoff intervention exclusive | ✓ Target ✓ ✓
READ | exclusive | BINVV | Backoff invalidate, vector format | Vector ✓ ✓ ✓ ✓ ✓
READ | exclusive | BINVP | Backoff invalidate, pointer format | Target ✓ ✓ ✓ ✓
READ | GET | IRPLY | Invalid reply | ✓ ✓ ✓ ✓ ✓
READ | GET | ISPEC | Invalid speculative reply | ✓ ✓ ✓ ✓ ✓
READ | GET | IRESP | Invalid response | ✓ ✓ ✓ ✓ ✓
READ | GET | IACK | Invalid acknowledge | ✓ ✓ ✓ ✓
READ | GET | NACKG | Negative acknowledge to GET | ✓ ✓ ✓ ✓
READ | GET | BIINV | Backoff intervention invalid | Target ✓ ✓ ✓ ✓
READ | GET | BIINF | Backoff intervention invalid forced | ✓ Target ✓ ✓ ✓ ✓
READ | etc. | ARRP | AMO read reply | ✓ ✓ ✓ ✓ ✓
READ | etc. | NCRP | Non-coherent read reply | ✓ ✓ ✓ ✓ ✓
READ | etc. | NACK | Coherent read negative acknowledge | ✓ ✓ ✓ ✓
WRITE | writeback | WBACK | Writeback acknowledge | ✓ ✓ ✓ ✓
WRITE | writeback | WBBAK | Writeback busy acknowledge | ✓ ✓
WRITE | PUT | WACK | Write invalidate acknowledge | Ack Cnt ✓ ✓ ✓ ✓
WRITE | PUT | WACKP | Write invalidate ack, send PRGE | Ack Cnt ✓ ✓ ✓ ✓
WRITE | PUT | WNACK | Write invalidate negative acknowledge | ✓ ✓ ✓ ✓
WRITE | PUT | BFLSH | Backoff flush | Target ✓ ✓
WRITE | PUT | BERSE | Backoff erase | Target ✓ ✓ ✓ ✓
WRITE | etc. | AWAK | AMO write acknowledge | ✓ ✓ ✓ ✓
WRITE | etc. | NCWAK | Non-coherent write acknowledge | ✓ ✓ ✓
PROBE | shared | SHWB | Sharing writeback | ✓ ✓ ✓ ✓
PROBE | shared | DNGRD | Downgrade | ✓ ✓
PROBE | shared | SHWBR | Sharing writeback, prior WB pending | ✓ ✓ ✓ ✓
PROBE | shared | DNGDR | Downgrade with prior WB pending | ✓ ✓
PROBE | exclusive | PRGE | Purge | ✓ ✓
PROBE | exclusive | XFER | Ownership transfer | ✓ ✓ ✓ ✓
PROBE | exclusive | PRGER | Purge with prior WB pending | ✓ ✓
PROBE | exclusive | XFERR | Ownership transfer, prior WB pending | ✓ ✓ ✓ ✓
PROBE | exclusive | IWACK | Implicit writeback race acknowledge | ✓ ✓
PROBE | GET | IIACK | Intervention invalid ack | ✓ ✓
PROBE | etc. | IVACK | Invalidate ack | ✓ ✓ ✓ ✓ ✓ ✓
ERROR | | PERR | Poisoned access error | ✓ ✓ ✓
ERROR | | AERR | Read protection violation error | ✓ ✓ ✓
ERROR | | WERR | Write protection violation error | ✓ ✓ ✓
ERROR | | DERRR | Directory error on a read request | ✓ ✓ ✓
ERROR | | DERRW | Directory error on a write request | ✓ ✓ ✓

Incoming requests used by other nodes in system 10 to request data from memory include RDEXC, RDSHD, and READ which are used by processors to request coherent data in the exclusive, shared, or most convenient state, respectively; RDXRO and RDXRN used by IO nodes to request a read only copy without using the sharing vector; GET and GETF which are used to request the current state of a cache line without keeping future coherence; NCRD which is used for a non-cached read of a double word; and AMOR which is used to request a special atomic memory read. Nodes return cache lines to memory by RQSH and RQSHR which are used to return an exclusive line to memory which has not been modified and the data itself is thus not returned; WRBK, WRBKR, and IWE which are used to return modified data to memory; PUT which is used by the IO system to overwrite all copies of a cache line without regard to its previous state; NCWRD and NCWRF which are used for non-cached writes of doublewords and cache lines; AMOW which is used to accomplish a special atomic memory write; and PFCL which is used to flush a cache line and force it out of all system caches.

Incoming replies are used to close out various transient states of the directory. They include XFER and XFERR, which are used to return dirty data to memory when another node is getting a clean exclusive copy; SHWBR, which is used to return dirty data to memory when the sending node and another node will be sharing the cache line; DNGRD and DNGDR, which are used to notify the directory that the node now holds data shared rather than clean exclusive; PRGE and PRGER, which are used to notify the directory that the node no longer holds the cache line at all; IIACK, which is used to notify the directory that the current value of a cache line has been forwarded to a requestor who sent a GET; and IWACK, which is used to close out a particularly complex case in the protocol involving implicit writebacks.

Outgoing requests are used if outgoing request credits are available. These include INTER and INEXC, which are used to request that an intervention be used to send a copy of the cache line to the requestor who wants it in a shared or exclusive state; ININV and ININF, which are used to request that a Memory Read Current be done and the results passed to the requestor who no longer wants a coherent copy; INVAL, which is used to request that a node drop a clean copy of a cache line; LINVV, which is used to request that the Local Block send some number of invalidates based on a copy of the sharing vector from the directory entry; and FLSH and ERASE, which are used to remove a cache line from a node with or without the return of any dirty data to the home memory. Outgoing backoff replies may be sent in place of outgoing requests if there is a potential for deadlock. These backoff replies are sent to the original requestor, who has space to store the needed action until it can be accomplished. Outgoing backoff replies are sent when there are no outgoing request credits available. They include BINTR, BIEXC, BIINV, BIINF, BINVP, BINVV, BFLSH, and BERSE.
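The choice between an outgoing request and a backoff reply described above can be sketched as follows. This is a minimal illustration, not the disclosed hardware: the function name, the integer credit counter, and the pairing of each request with a backoff reply (in particular INVAL with BINVP and LINVV with BINVV) are assumptions inferred from the message names.

```python
# Hypothetical pairing of each outgoing request with its backoff reply.
# The pairing is inferred from the message names and is an assumption.
BACKOFF_FOR = {
    "INTER": "BINTR",
    "INEXC": "BIEXC",
    "ININV": "BIINV",
    "ININF": "BIINF",
    "INVAL": "BINVP",
    "LINVV": "BINVV",
    "FLSH": "BFLSH",
    "ERASE": "BERSE",
}


def send_probe(request, credits):
    """Return (message, destination, remaining_credits).

    With a credit available the request goes to its target; otherwise the
    matching backoff reply goes to the original requestor, who issues the
    probe itself later.
    """
    if credits > 0:
        return request, "target", credits - 1
    return BACKOFF_FOR[request], "requestor", credits
```

For example, `send_probe("FLSH", 0)` yields a BFLSH addressed to the requestor because no outgoing request credit is left.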

Other outgoing replies involve returning data to a requestor. These include SRPLY, ERPLY, ERPYP, and IRPLY, which return usable data to the requestor indicating different states; ESPEC and ISPEC, which return speculative data to the requestor where there may or may not be a dirty copy in the system which needs to supersede the speculative data (with the requestor waiting to find out); NCRP, which is used to return non-cached data; and ARRP, which is used to return the results of an atomic read operation. Acknowledge writes include WBACK and WBBAK, which are used to acknowledge writebacks and communicate whether the node needs to wait for a further message; WACK and WACKP, which are used to acknowledge PUT and PFCL messages and indicate whether the sender needs to wait for INVAL or not; NCWAK, which is used to acknowledge a non-cached write; and AWAK, which is used to acknowledge an atomic memory write. Messages used to refuse acknowledgment of a request where the requestor must take appropriate action include NACK, NACKG, and WNACK. Error conditions are indicated by AERR, DERRR, DERRW, WERR, and PERR.

Tables VIII and IX show the request and reply messages for the programmed input/output (PIO) protocol. PIO reads and writes of both a single doubleword and a full cache line are supported.

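TABLE VIII
Group | Name | Description | Payload (CL, Other) | Suppl
Initial Requests, read | PRDI | PIO dword read | | Mask
Initial Requests, read | PCRDI | PIO cache line read | |
Initial Requests, write | PWRI | PIO dword write | ✓ | Mask
Initial Requests, write | PCWRI | PIO cache line write | ✓ |
Retry Requests, read | PRIHA/B | PIO dword read retry, head A/B | | Mask
Retry Requests, read | PRIRA/B | PIO dword read retry, non-head A/B | | Mask
Retry Requests, read | PCRHA/B | PIO cache read retry, head A/B | |
Retry Requests, read | PCRRA/B | PIO cache read retry, non-head A/B | |
Retry Requests, write | PWIHA/B | PIO dword write retry, head A/B | | Mask
Retry Requests, write | PWIRA/B | PIO dword write retry, non-head A/B | | Mask
Retry Requests, write | PCWHA/B | PIO cache write retry, head A/B | |
Retry Requests, write | PCWIA/B | PIO cache write retry, non-head A/B | |
(Retry requests have two flavors, A and B, which are used to guarantee forward progress.)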

TABLE IX
Group | Name | Description | Payload (CL, Other) | Suppl
ACK responses | PRPLY | PIO dword read reply | ✓ |
ACK responses | PCRPY | PIO cache line read reply | ✓ |
ACK responses | PACKN | PIO dword write ack, normal mode | |
ACK responses | PACKH | PIO dword write ack, head mode | |
ACK responses | PCAKN | PIO cache line write ack, normal mode | |
ACK responses | PCAKH | PIO cache line write ack, head mode | |
NACK responses | PNKRA/B | PIO dword read NACK, queue A/B | |
NACK responses | PCNRA/B | PIO cache line read NACK, queue | |
NACK responses | PNKWA/B | PIO dword write NACK, queue A/B | |
NACK responses | PCNWA/B | PIO cache line write NACK, queue A/B | |
Error responses | PCNWA | PIO read error | |
Error responses | PWERR | PIO write error | |
Error responses | PSDBK | PIO TLB shootdown deadlock break | |

Table X shows the request and reply messages for the graphics flow control protocol. This protocol provides the means by which uncached writes to a graphics region of the physical address space are transferred to a graphics device. A graphics write is received from the front side bus and forwarded to the proper destination. As the graphics device consumes data, credits are returned to the originating node to permit additional graphics writes to be sent.

TABLE X
Name | Description | Payload | Suppl
GFXW1 | Graphics dword write | DW |
GFXWC | Graphics cache line write | CL |
GFXCR | Graphics credit | | Credits
GFXER | Graphics write error | |
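The credit-based graphics flow control described above can be illustrated with a small sender-side model. This is a sketch under stated assumptions: the class name, the one-credit-per-write accounting, and the software queue of waiting writes are hypothetical, since the text only says that returned credits permit additional graphics writes.

```python
class GraphicsSender:
    """Hypothetical originating node that sends graphics writes only while
    it holds credits, and resumes when credits are returned (GFXCR)."""

    def __init__(self, credits):
        self.credits = credits   # writes the node may still send
        self.pending = []        # writes waiting for a credit
        self.sent = []           # writes forwarded to the graphics device

    def write(self, data):
        self.pending.append(data)
        self._drain()

    def credit_returned(self, count=1):
        self.credits += count
        self._drain()

    def _drain(self):
        # Forward queued writes in order, spending one credit per write.
        while self.credits > 0 and self.pending:
            self.sent.append(self.pending.pop(0))
            self.credits -= 1
```

With two initial credits, a third write stays queued until the device returns a credit.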

Table XI shows the request and reply messages for the administrative protocol. The administrative protocol supports several types of messages that act on the router itself rather than simply being passed through the router. These messages include vector operations to read and route internal router state and additional messages used in implementing the hardware barrier tree mechanism. Other messages facilitate interrupt and TLB shootdown distribution.

TABLE XI
Name | Description | Payload, Suppl
VRD | Explicitly routed (vector) read | ✓
VWR | Vector write | ✓
BAR | Vector barrier | ✓
LINTR | Local interrupt (normally never appears on the network, but error interrupts on headless nodes are directed off-node) | ✓
LPTC | Local TLB shootdown | ✓
VRPLY | Vector read reply | ✓
VWACK | Vector write ack | ✓
VERRA | Vector address error | ✓
VERRC | Vector command error | ✓
VERAC | Vector address/command error | ✓

Despite the many message types and transient states to track and resolve, the protocol scheme follows a basic function to handle initial request messages. In general, processors and input/output agents issue coherent read and write request messages to memory. How a particular read or write request message is processed is determined by the directory state when the initial request message reaches the directory. The memory will service each individual request message according to one of several generalized procedures. Memory may respond to a request message through a direct reply, wherein a read data or write acknowledge reply is sent to the message requestor if the cache line is in a standby state, or by NACKing the request message if the cache line is in a transient state. The memory may also return a preliminary reply and issue an intervention request, an invalidate request, or a backoff response. The intervention request is sent to the current owner of the cache line. The invalidate request is sent to the current owner of the cache line and sharers thereof. The backoff response is sent to the requestor in order to have the requestor issue the intervention or invalidate requests on its own. The subsequent messages issued by the memory will eventually produce another reply message which is forwarded to the requestor advising of the final disposition of the request message.
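The generalized procedures above can be summarized in a short dispatch sketch. It is deliberately simplified and rests on assumptions: only the UNOWN, SHRD, and EXCL states are modeled, the set of transient state names is taken from Table XII, and the function name, parameters, and returned strings are hypothetical labels rather than protocol messages.

```python
# Transient directory states named in Table XII; grouping them in one set
# is an assumption made for this sketch.
TRANSIENT_STATES = {"BUSY", "BSYX", "BSYN", "BUSYI", "BSYG", "BSYF"}


def service_request(line_state, wants_exclusive, can_send_probe=True):
    """Return the actions memory takes for an initial coherent request."""
    if line_state in TRANSIENT_STATES:
        return ["NACK"]
    if line_state == "UNOWN" or (line_state == "SHRD" and not wants_exclusive):
        return ["direct reply"]
    if line_state not in ("SHRD", "EXCL"):
        raise ValueError("state not modeled in this sketch: " + line_state)
    if not can_send_probe:
        # The requestor is asked to issue the probe on its own.
        return ["preliminary reply", "backoff response to requestor"]
    if line_state == "EXCL":
        return ["preliminary reply", "intervention to owner"]
    return ["preliminary reply", "invalidate to sharers"]
```

A shared read of an exclusively held line, for instance, produces a preliminary reply plus an intervention to the current owner.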

Coherent read request messages include a shared read that obtains a read-only copy of a cache line for which other read-only copies may exist elsewhere in the system. The read-only copy is persistent in that the memory system tracks all sharers so that it may invalidate their copies if the cache line is subsequently modified. An exclusive read is a read and writable copy of a cache line for which no other copy is allowed to exist except for the one in main resident memory. Memory will retrieve the cache line from an exclusive owner if some other entity desires a coherent copy of it. A get read obtains a momentarily coherent read-only copy of a cache line. The memory system does not include the requester in the sharer tracking process and essentially forgets about the copy obtained in this manner.

Coherent write request messages may be a writeback of exclusively held cache resident cache lines to memory. An explicit writeback occurs when a dirty exclusive (DEX) line in a processor cache is evicted to make room for a new cache line from another memory address. A relinquish writeback is similar to an explicit writeback except that the cache line is still clean (CEX) so no data is actually returned to memory. An implicit writeback occurs as a result of a probe to a dirty cache line on the owner's front side bus either by another processor on that front side bus or as part of an intervention issued on behalf of the memory system. A coherent write request message may also be a put write message that writes full cache lines of data directly to memory rather than by obtaining an exclusive copy of a cache line and modifying it remotely before returning it to memory. As a result, all remote copies of a targeted cache line are invalidated.

Request messages that query the processor cache hierarchy on a front side bus are called probes. A probe may include an invalidate request or an intervention request. An invalidate request will expunge shared copies of a cache line if it is still present in one or more of the caches on the front side bus. An intervention request will retrieve the up to date value of an exclusively held and possibly modified cache line in one of the caches on the target front side bus. A probe ultimately results in one or more additional reply messages sent back to the original requestor and a separate reply message sent back to the directory. If memory cannot safely issue a probe without risking a chance of deadlock, it will issue a backoff response message to the requestor instead of directly sending the probe. The backoff response message tells the requestor to initiate the probe on its own. Subsequent protocol procedures at the directory and elsewhere are essentially unchanged regardless of who issues the probe.

Table XII shows examples of coherent request messages that a directory may receive and the initial and secondary actions that may be taken in response to the request messages. Backoff responses and secondary transient states are not shown. Replies from the directory target the requestor and probes target the current owner or sharers of record. Probe responses are generally returned to the directory by the current owner. Invalidate probes do not produce probe responses to the directory except for a write invalidate message (PUT or PFCL) and read exclusive read-only request messages (RDXRN or RDXRO). In these cases, the probe response is a PRGE from the original requestor rather than from the current owner.


TABLE XII
Columns (in order): Request Type; Current Line State; Actions: Reply Type, AckCnt, Probe Request, Vector Action; Primary Transient State; Probe Response; Final Line State

READ
  UNOWN  ERPLY  0  pointer  EXCL
  SHRD  SRPLY  add  SHRD
  EXCL  ESPEC  INTER  pointer  BUSY  DNGRD SHRD  SHWB  PRGE EXCL  XFER
  SXRO
  SXRO (Exp)
  all others
RDSHD (same as READ except SXRO)
  UNOWN  SRPLY  new  SHRD
  SHRD  SRPLY  add  SHRD
  EXCL  ESPEC  INTER  pointer  BUSY  DNGRD SHRD  SHWB  PRGE EXCL  XFER
  SXRO  ERPLY  1  INVAL  new  SHRD
  SXRO (Exp)  SRPLY  pointer  SHRD
  all others  NACK  n/c
RDEXC
  UNOWN  ERPLY  0  pointer  EXCL
  SHRD  ERPLY  # shares  INVAL(s)  pointer
  EXCL  ESPEC  INEXC  pointer  BUSY  PRGE  XFER
  SXRO  ERPLY  1  INVAL  pointer
  SXRO (Exp)  ERPLY  0
  all others  NACK  n/c
RDXRO
  UNOWN  ERPLY  0  SXRO
  SHRD  ERPLY  # shares  INVAL(s)  pointer  BSYX  PRGE
  EXCL  ESPEC  INEXC  pointer  XFER  PRGE
  SXRO  ERPLY  1  INVAL  pointer  PRGE
  SXRO (Exp)  ERPLY  0  pointer
  all others  NACK  n/c
RDXRN
  UNOWN  ERPLY  0  SXRO
  SHRD  ERPLY  # shares  INVAL(s)  pointer  BSYN  PRGE
  EXCL  ESPEC  INEXC  pointer  XFER  PRGE
  SXRO  ERPLY  1  INVAL  pointer  PRGE
  SXRO (Exp)  ERPLY  0  pointer
  all others  NACK  n/c
GET
  UNOWN  IRPLY  n/c  UNOWN
  SHRD  IRPLY  n/c  SHRD
  EXCL  none  ININV  n/c  BUSYI  IIACK  EXCL
  SXRO  IRPLY  n/c  SXRO
  SXRO (Exp)  IRPLY  n/c  SXRO (Exp)
  all others  NACK  n/c  all others
GETF (same as GET except EXCL case)
  UNOWN  IRPLY  UNOWN
  SHRD  IRPLY  n/c  SHRD
  EXCL  ISPEC  ININF  new?  BSYG  DNGRD SHRD  n/c  PRGE UNOWN  n/c  XFER UNOWN
  SXRO  IRPLY  n/c  SXRO
  SXRO (Exp)  IRPLY  n/c  SXRO (Exp)
  all others  NACK  n/c  n/c
PUT
  UNOWN  WACK  0  UNOWN
  SHRD  WACKP  # shares  INVAL(s)  BSYF  PRGE
  EXCL  none  ERASE
  SXRO  WACKP  1  INVAL
  SXRO (Exp)  WACK
  all others  WNACK  n/c
PFCL (same as PUT except EXCL case)
  UNOWN  WACK  0  UNOWN
  SHRD  WACKP  # shares  INVAL(s)  BSYF  PRGE
  EXCL  none  FLSH  XFER
  SXRO  WACKP  1  INVAL  PRGE
  SXRO (Exp)  WACK
  all others  WNACK  n/c
WRBK  EXCL¹  WBACK  UNOWN
WRBKR
RQSH
RQSHR
IWE  EXCL

Writebacks (WRBK, WRBKR, RQSH, RQSHR, and IWE) should never hit a line in SHRD, SXRO, or UNOWN. Writebacks to any transient state line (BUSY, etc.) represent protocol races. These are not NACKed as all other requests would be because the information needed to fully process the request is implicit in the request itself. However, the processing also depends on current and pending ownership and the specific type of transient state encountered. In general, the reply to a writeback request in this case is either a normal WBACK or a WBBAK (Writeback Busy Acknowledge).

Processor 16 defines a slightly different set of state transitions in response to interventions than was used in other processors such as the R10000. Table XIII shows the state transitions for processor 16 as compared to other processors such as the R10000. The main difference is in the handling of a shared intervention (BRL) that targets a cache line in a dirty exclusive (M) state. The M to I transition on a BRL differs from traditional handling of shared interventions. This difference, though seemingly minor, has a significant impact on the directory state transitions that occur in the course of handling an intervention. The complication occurs in that the directory does not know the ultimate state of the cache line in the old owner's cache until the intervention is issued and the snoop result observed. Further complicating matters is the possibility that a writeback (WRBK), relinquish (RQSH), or implicit writeback (IWE) will be outstanding when the intervention arrives.

TABLE XIII
Intervention Type | Current Cache State | New Cache State, Other Processors | New Cache State, Processor 16
Shared (BRL) | DEX (M) | SHD (S) | INV (I)
Shared (BRL) | CEX (E) | SHD (S) | SHD (S)
Shared (BRL) | SHD (S) | SHD (S) | SHD (S)
Shared (BRL) | INV (I) | INV (I) | INV (I)
Exclusive (BRIL) | DEX (M) | INV (I) | INV (I)
Exclusive (BRIL) | CEX (E) | INV (I) | INV (I)
Exclusive (BRIL) | SHD (S) | INV (I) | INV (I)
Exclusive (BRIL) | INV (I) | INV (I) | INV (I)
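The transitions of Table XIII can be captured as a lookup table. The sketch below simply encodes the table; the dictionary layout and the function name are hypothetical.

```python
# (intervention, current state) -> (new state on other processors,
#                                   new state on processor 16)
NEW_STATE = {
    ("BRL", "M"): ("S", "I"),   # the one row where the two columns differ
    ("BRL", "E"): ("S", "S"),
    ("BRL", "S"): ("S", "S"),
    ("BRL", "I"): ("I", "I"),
    ("BRIL", "M"): ("I", "I"),
    ("BRIL", "E"): ("I", "I"),
    ("BRIL", "S"): ("I", "I"),
    ("BRIL", "I"): ("I", "I"),
}


def new_cache_state(intervention, current, processor16=True):
    """Return the cache state after an intervention, per Table XIII."""
    other, p16 = NEW_STATE[(intervention, current)]
    return p16 if processor16 else other
```

Only the shared intervention on a dirty exclusive line distinguishes processor 16 from the other processors: `new_cache_state("BRL", "M")` gives `"I"`, while `new_cache_state("BRL", "M", processor16=False)` gives `"S"`.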

The following is an example of intervention handling. When there is no write request message outstanding (no WRBK, RQSH, or IWE), an IRB entry in processor interface 24 is allocated and an intervention is issued on the front side bus. A BRL is issued for INTER and ININF probes. A BRIL is issued for INEXC and FLSH probes. A BIL is issued for an ERASE probe. A BRCL is issued for an ININV probe. Once the intervention has issued, the IRB awaits the snoop result to determine the state of the cache line in the processor cache. Processing of the intervention varies according to the snoop result.

If the cache line was in the M state (HITM asserted in the snoop phase), the old owner will not retain the cache line at all. The requestor takes the cache line as clean exclusive (CEX). The final directory state becomes EXCL with the requestor as the owner. The old owner sends an ownership transfer (XFER) message to the directory and, if the intervention was not a FLSH or ERASE, sends an ERESP message to the requestor. An IRESP message is sent if the intervention was an ININF.

If the cache line was in the E or S states (HIT asserted in the snoop phase), the old owner will retain a shared copy of the cache line. The requestor takes the cache line as shared (SHD). The final directory state of the cache line will be SHRD with both the old owner and requestor as sharers. The old owner will send a downgrade (DNGRD) message to the directory and, if the intervention was not a FLSH or ERASE, sends an SACK message to the requestor. An IACK message is sent if the intervention was an ININF.

If the cache line was in the I state (neither HIT nor HITM asserted in the snoop phase), the old owner will not retain the cache line at all and the requestor takes the cache line EXCL as in the M state case above. This case occurs when the old owner originally obtained the cache line CEX and dropped it without issuing a relinquish request message. The old owner will send a purge (PRGE) message to the directory and, if the intervention was not a FLSH or ERASE, sends an EACK message to the requestor. An IACK message is sent if the intervention was an ININF.
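The simple intervention case above (no write request outstanding) can be sketched as a pair of lookups. The sketch follows the paragraph literally; the names `BUS_OP`, `BY_SNOOP`, and `handle_intervention`, and the use of `"MISS"` for "neither HIT nor HITM", are assumptions. The replies for an ININV probe are not spelled out in the passage, so the sketch declines to model them.

```python
# Front side bus transaction issued for each probe type.
BUS_OP = {
    "INTER": "BRL", "ININF": "BRL",
    "INEXC": "BRIL", "FLSH": "BRIL",
    "ERASE": "BIL",
    "ININV": "BRCL",
}

# snoop result -> (message to directory, reply to requestor,
#                  reply to requestor when the probe was ININF)
BY_SNOOP = {
    "HITM": ("XFER", "ERESP", "IRESP"),   # line was in the M state
    "HIT": ("DNGRD", "SACK", "IACK"),     # line was in the E or S state
    "MISS": ("PRGE", "EACK", "IACK"),     # line was in the I state
}


def handle_intervention(probe, snoop):
    """Return (bus transaction, message to directory, message to requestor)."""
    if probe == "ININV":
        raise NotImplementedError("ININV replies are not modeled here")
    directory_msg, normal_reply, invalid_reply = BY_SNOOP[snoop]
    if probe in ("FLSH", "ERASE"):
        requestor_msg = None              # no reply goes to the requestor
    elif probe == "ININF":
        requestor_msg = invalid_reply
    else:
        requestor_msg = normal_reply
    return BUS_OP[probe], directory_msg, requestor_msg
```

A FLSH probe that hits a modified line, for example, yields a BRIL on the bus and an XFER to the directory, with nothing sent to the requestor.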

Different processing is needed to handle an intervention that arrives when a write request message is outstanding. Processing of the intervention depends on what types of write request messages are outstanding. There may be more than one type outstanding as the WRB entry in processor interface 24 can hold two write requests, one that has been sent into the network (the WRB T field) and a second that is pending (the WRB P field). Table XIV shows the intervention processing possibilities when a write request message is outstanding. The first line of Table XIV shows the case discussed above with no write request message outstanding. If there is a writeback or relinquish outstanding, no intervention needs to be issued because the presence of the writeback or relinquish indicates that the processor no longer holds the cache line. In the WRBK and WRBKR cases, the data is forwarded from the WRB data buffer to the requestor as part of the ERESP message. In the RQSH and RQSHR cases, no data is available and thus only an EACK message needs to be sent. The WRB P field is none in these cases as the processor does not generate further write requests once it has issued a writeback or relinquish message.

TABLE XIV
WRB T Field | WRB P Field | Issue Intervention on FSB? | Message to Directory | Message to Requester
none | none | Yes | (Per Simple Intervention) | (Per Simple Intervention)
BWL | none | No | none | ERESP
BWLR | none | No | PRGER | ERESP
BRQSH | none | No | none | EACK
BRQHR | none | No | PRGER | SACK
BIWE | none | Yes | (See discussion below) | (See discussion below)
BIWE | BIWE | Yes | (See discussion below) | (See discussion below)
BIWE | BRQSH | No | PRGER | ERESP
BIWE | BRQHR | No | PRGER | ERESP
BIWE | BWL | No | XFERR | ERESP
BIWE | BWLR | No | XFERR | ERESP

The “I” versions of the messages are sent if the intervention was an ININF. That is, an IRESP instead of an ERESP and an IACK instead of an EACK. Also, the WRBKR case has further complications that result from a possible race between a WRBKR and a PUT message. These complications require that the message to the requestor be delayed until the old owner receives either a WBACK or WBBAK. Depending on whether a WBACK or WBBAK is received, the old owner sends either an ERESP or an EACK to the requester.
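The writeback and relinquish cases discussed above, where no intervention is issued on the front side bus, can be sketched as a single function. The function name and signature are hypothetical, and the sketch ignores the WRBKR/PUT race, which in the described scheme delays the reply until a WBACK or WBBAK arrives.

```python
def reply_without_intervention(outstanding, probe, wrb_data=None):
    """Return (message name, data) sent to the requestor when a writeback
    or relinquish is already outstanding and no bus intervention is issued.
    """
    use_invalid_variant = probe == "ININF"   # the "I" versions of the messages
    if outstanding in ("WRBK", "WRBKR"):
        # Data is forwarded from the WRB data buffer.
        return ("IRESP" if use_invalid_variant else "ERESP"), wrb_data
    if outstanding in ("RQSH", "RQSHR"):
        # The line was clean, so no data is available to forward.
        return ("IACK" if use_invalid_variant else "EACK"), None
    raise ValueError("not a writeback or relinquish: " + outstanding)
```

For instance, an INTER probe arriving while a WRBK is outstanding is answered with an ERESP that carries the buffered data.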

Complications occur when there is an implicit writeback (IWE) outstanding in the network. The IWE data in the WRB data buffer may or may not be the most up to date copy of the cache line. If the WRB P field indicates a writeback or relinquish message, then the WRB data is up to date and forwarded to the requestor in an ERESP message. If no write request is pending or if there is a second IWE pending, the intervention is issued on the front side bus to determine whether the processor has modified the cache line since issuing the initial IWE. If the snoop result is HITM, the data from the front side bus is forwarded to the requestor and the directory in the same manner as the M state discussed above. If the snoop result is HIT or neither HIT nor HITM, then the data in the WRB data buffer is current and forwarded to the requestor as either an ERESP or SRESP message depending on the intervention type. The data is sent to the directory as either a SHWB or XFER depending on the intervention type. The WRB data is not forwarded to the directory if the WRB P field is NONE since the IWE already outstanding in the network contains the up to date copy of the cache line. In this case, a PRGER message is sent instead.

Implicit writebacks (IWE) are generated when a processor issues a BRL or BRIL and the HITM signal is asserted in the snoop phase, indicating that another processor on the bus holds the cache line in a DEX state and will supply the data to the requesting processor. Since the processor asserting HITM is relinquishing ownership of a modified cache line and the requesting processor is not guaranteed to place the cache line in its cache in a DEX state, the cache line could be dropped from all processors on the bus and its contents lost upon a cache to cache transfer. Thus, at the same time the processor asserting HITM is transferring the cache line to the requesting processor, the cache line is read and written back to memory. This writing back to memory in this instance is an implicit writeback. Three implicit writeback cases are discussed below.

When a requesting processor issues a BRL, the cache line is loaded into the requesting processor's cache in the CEX state and dropped from the owning processor's cache. An implicit writeback message is generated in this instance. The IWE message includes the latest copy of the cache line and indicates that the cache line is being retained in the CEX state by the originator of the IWE message. Since the cache line is now in the CEX state, the new owning processor can write to the cache line and update its state to DEX at any time. If such a write occurs and the state becomes DEX and another processor on the bus issues a BRL, the implicit writeback case will once again arise. This situation may repeat indefinitely, thereby generating an unbounded number of implicit writebacks. When a requesting processor issues a BRIL with OWN# not asserted, the cache line is loaded in the CEX state into the requesting processor and is dropped from the cache of the owning processor similar to the BRL case above. When a requesting processor issues a BRIL with OWN# asserted, the requesting processor indicates that it will place the line in its cache in the DEX state rather than the CEX state. An implicit writeback is not required as the requesting processor cannot drop the cache line without first issuing a normal writeback.
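The three cases above reduce to a small predicate. The function name and boolean parameters are hypothetical; the logic restates the text: an implicit writeback accompanies a HITM on a BRL, or on a BRIL without OWN# asserted.

```python
def needs_implicit_writeback(bus_op, hitm, own_asserted=False):
    """Return True when the cache-to-cache transfer must also be written
    back to memory (an IWE), per the three cases described in the text."""
    if not hitm:
        return False            # no modified copy is being handed over
    if bus_op == "BRL":
        return True             # requestor takes the line CEX and may drop it
    if bus_op == "BRIL":
        return not own_asserted # OWN# means the line is kept DEX
    return False
```

`needs_implicit_writeback("BRIL", hitm=True, own_asserted=True)` is `False` because the new owner cannot drop the line without a normal writeback.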

Ordinarily, the most up to date copy of a cache line is in one of two places: the cache of the owning processor or main memory. Obtaining the latest copy of a cache line is simply performed by sending an intervention to the owner. If the intervention retrieves the cache line with state DEX, then the cache line is the latest copy. If the state of the cache line is not DEX, the cache line was dropped or is being written back and the directory will receive the latest copy when the writeback arrives. As a cache line can be written back only once, by definition the latest copy of the cache line is received when the writeback arrives. However, implicit writebacks considerably complicate finding the latest copy of a cache line. The problem lies in that the implicit writeback may or may not have the latest copy of the cache line. Only by issuing an intervention can the latest copy of the cache line be discovered. If the intervention finds the cache line in a DEX state, then that is the latest copy. If the cache line has been dropped, then the implicit writeback has the most up to date copy of the cache line. However, the processor can issue multiple implicit writebacks. If the cache line is not in the processor's cache, the protocol scheme needs to ensure that data is retrieved from the most recently issued implicit writeback, which may or may not be the one that is in flight in the network or has just been received at the directory.

FIG. 3 shows an example to alleviate the problem of multiple implicit writebacks flowing through system 10. In FIG. 3, a processor 100 has obtained a copy of a cache line and sends an implicit writeback. The implicit writeback is processed by the front side bus processor interface 24 and sent to the appropriate memory directory interface unit 22 associated with the memory 17 which is the home for the cache line. Upon processing the implicit writeback, memory directory interface unit 22 returns a writeback ACK. Front side bus processor interface 24 receives the writeback ACK to indicate that memory 17 has the same copy of the cache line as processor 100. If changes to the cache line are made by processor 100, it will initiate another writeback, either a normal writeback or an implicit writeback, for each change made to the cache line. Also, ownership of the cache line may pass back and forth between co-located processors 101 in a node, each initiating implicit or normal writebacks. Instead of processing each and every writeback initiated by processor 100, front side bus processor interface 24 will maintain the most recent writeback request in a queue 102. For each implicit or normal writeback request received at its queue, front side bus processor interface 24 will discard the previous writeback request. Once front side bus processor interface 24 receives the writeback ACK from memory directory interface unit 22 for the initial implicit writeback, the current writeback request, if any, in the queue is transferred to memory directory interface unit 22 for processing and the process repeats. If the current writeback request in the queue is an implicit writeback, then the process is repeated. If the current writeback request in the queue is a normal writeback, then any subsequent writebacks are processed in the order they are received. Once an implicit writeback is reached, the above process may be repeated.
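The queue behavior of FIG. 3 can be modeled with a one-entry pending slot that is overwritten by each newer writeback while an earlier one awaits its ACK. This is a simplified sketch: the class and method names are hypothetical, and the separate in-order handling of normal writebacks described above is not modeled.

```python
class WritebackCoalescer:
    """Hypothetical model of queue 102: at most one writeback is in flight,
    and only the most recent later writeback is kept while waiting."""

    def __init__(self):
        self.in_flight = None   # writeback awaiting its ACK from the directory
        self.pending = None     # most recent writeback received meanwhile
        self.sent = []          # everything actually sent to the directory

    def writeback(self, data):
        if self.in_flight is None:
            self._send(data)
        else:
            self.pending = data  # discards any previously queued writeback

    def ack(self):
        # The directory acknowledged the in-flight writeback.
        self.in_flight = None
        if self.pending is not None:
            data, self.pending = self.pending, None
            self._send(data)

    def _send(self, data):
        self.in_flight = data
        self.sent.append(data)
```

If three writebacks arrive before the first ACK, only the first and the third reach the directory, which bounds the traffic that repeated implicit writebacks can generate.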

FIG. 3 also shows the events that occur when a remote processor seeks access to the cache line prior to processing of the implicit writeback. After processor 100 initiates an implicit writeback to front side bus processor interface 24, a remote processor 200 initiates a read request to memory directory interface unit 22. Memory directory interface unit 22 initiates an intervention for transfer to front side bus processor interface 24 since it thinks that processor 100 is the current owner of the cache line. Memory directory interface unit 22 will also send a speculative response to remote processor 200 since it thinks it has the latest copy of the cache line. Front side bus processor interface 24 receives the intervention but knows it has an implicit writeback to process. The intervention is placed on hold and the implicit writeback is sent to memory directory interface unit 22. Upon processing the implicit writeback, memory directory interface unit 22 sends the writeback ACK. Front side bus processor interface 24 receives the writeback ACK and determines if there is a pending writeback in its queue 102. If so, front side bus processor interface 24 sends out the pending writeback to memory directory interface unit 22 and also sends out a response to remote processor 200 since it has the latest copy of the cache line. In this manner, the latest copy of the cache line may be provided for read requests while a writeback is pending.

FIG. 4 shows an example of the transfer of ownership of a cache line during a pending writeback. A cache coherence protocol that is based upon supporting nodes with snoopy processor buses that generate implicit writeback operations can cause delay in the transition of ownership to a node/processor if another node/processor already has exclusive ownership and is in the process of writing modified data back to memory. The transfer of ownership provided in FIG. 4 does not rely on the completion of a write to memory from the former owner of a cache line before allowing a new owner to gain exclusive ownership of that cache line. A processor 300 has a modified cache line and initiates either a normal or implicit writeback to front side bus processor interface 24. Prior to transfer of the writeback to memory directory interface unit 22, a remote processor 400 initiates a read request. Memory directory interface unit 22 generates an intervention message in response to the read request and receives the writeback from front side bus processor interface 24. Front side bus processor interface 24 receives the intervention message and, before receiving a writeback ACK from memory directory interface unit 22, sends a response to the intervention message to remote processor 400 that includes the cache line requested by remote processor 400. Remote processor 400 now has ownership of the cache line and can modify it or drop it as desired. If remote processor 400 drops the cache line, the cache line is not lost as the writeback from processor 300 is still pending to preserve the cache line in memory. If remote processor 400 modifies the cache line, a writeback is sent to memory directory interface unit 22 from remote processor 400. If the initial writeback is received at memory directory interface unit 22 first, then it will be processed followed by the writeback from remote processor 400 in a normal manner. If the writeback from remote processor 400 is received first, then memory directory interface unit 22 processes it and updates the cache line data in memory. Upon receiving the writeback from processor 300, memory directory interface unit 22 will not update the cache line data for this writeback.

In some circumstances, a processor may obtain ownership of a cache line and not make any changes to the cache line. The processor may just drop the cache line if it no longer needs it. If the processor drops the cache line, the rest of the system does not become aware of the dropping of the cache line and interventions for the cache line will continue to be sent to the processor. To avoid processing of interventions in this scenario, the processor is programmed to send out a relinquish message to let the system know that it is giving up ownership of the cache line. Thus, only those interventions need be processed that were initiated prior to processing of the relinquish message at memory directory interface unit 22. A relinquish message is processed as a dataless writeback since it is not modifying the cache line in memory, as the memory has the current copy of the cache line due to no changes being made to the cache line at the processor. Once the relinquish command has been processed, memory directory interface unit 22 can directly handle a read request without initiating an intervention to the processor that gave up ownership of the cache line.

FIG. 5 shows how memory latency can be reduced during read requests. System 10 is a distributed shared memory system with nodes based on snoopy processor buses. When processor 500 makes a read request, a snoop operation is performed at a colocated processor 600 on the local bus. Before the snoop operation is completed, the read request is forwarded from front side bus processor interface 24 to a local or remote memory directory interface unit 22 for processing. If the snoop operation determines that the cache line needed is held in colocated processor 600 by indicating a processor hit and the data being modified, the data is provided to processor 500 by colocated processor 600 over the local bus for its use. Memory directory interface unit 22 processes the read request and forwards a response to front side bus processor interface 24. Front side bus processor interface 24 sees that the snoop operation satisfied the read request and subsequently discards or ignores the response from memory directory interface unit 22.

If the snoop operation determines that the cache line is not available locally, then the cache line is obtained by processor 500 through normal processing of the read request. Memory directory interface unit 22 obtains the cache line from memory or fetches the cache line from a remote processor 605 if it has a modified version of the cache line. If processor 500 obtains the data from processor 600, processor 500 may place a writeback request to update the home memory for the data. The writeback request includes an indication that there is an outstanding read request in the system. In case the writeback request is received at memory directory interface unit 22 prior to the read request, the writeback request provides the necessary indication to memory directory interface unit 22 that the read request is not to be processed.
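The early forwarding of the read request and the handling of the writeback indication can be sketched as follows. The model is an assumption-heavy illustration: `HomeDirectory`, `local_read`, the per-line set of reads to skip, and the use of `None` for "read not processed" are all hypothetical, and the text does not say how the indication is stored.

```python
class HomeDirectory:
    """Hypothetical model of the home directory for the FIG. 5 read path."""

    def __init__(self, memory):
        self.memory = dict(memory)
        self.skip_reads = set()   # lines whose next read must not be processed

    def writeback(self, line, data, read_outstanding):
        self.memory[line] = data
        if read_outstanding:
            # The writeback overtook a read that was satisfied on the local bus.
            self.skip_reads.add(line)

    def read(self, line):
        if line in self.skip_reads:
            self.skip_reads.discard(line)
            return None           # the read request is not processed
        return self.memory[line]


def local_read(snoop_hit_modified, bus_data, directory_reply):
    """Pick the data the requesting processor uses: the local bus copy wins
    when the snoop reports a modified hit, and the directory reply is ignored."""
    return bus_data if snoop_hit_modified else directory_reply
```

Because the read request is forwarded before the snoop completes, the directory reply is already on its way in the miss case, and it is simply discarded in the hit case.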

FIG. 6 shows how cache flushes can be performed in system 10. Conventionally, a request to flush a cache in a local bus system provides a mechanism to have the memory maintain the only copy of a cache line with no processor maintaining a copy of the cache line. The local bus system is not aware of the other processors on other local buses having a copy of the flushed cache line in an implementation such as system 10. The technique of FIG. 6 extends the local bus system flush capability to the distributed shared memory multiprocessor computer system of system 10. A processor 600 initiates a flush request for a particular cache line. Processor interface 24 receives the flush request and performs a snoop operation to determine whether the cache line is maintained in any local processor and then whether the cache line has been modified. If the snoop result is that the cache line is maintained locally and has been modified, processor interface 24 initiates removal of the cache line from the cache of the identified processor. The identified processor initiates a writeback for transfer to memory directory interface unit 22 associated with the home memory 17 for the data in order to preserve its modifications.

If the snoop result is that the cache line is not maintained locally or the cache line has not been modified, processor interface 24 forwards the flush request to memory directory interface unit 22 associated with home memory 17 of the cache line. The local processors having an unmodified copy of the cache line may be flushed of the cache line at this point. Memory directory interface unit 22 determines which processors in system 10 maintain a copy of the cache line. The flush request is then forwarded to the identified processors for appropriate action. If an identified processor has a modified copy of the cache line, it removes the modified copy from its cache and forwards the modified copy in a writeback request to memory directory interface unit 22 for memory 17 update.
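The flush flow of FIG. 6 can be sketched as below. All class names, the two-state cache model, and the sharers table are hypothetical simplifications introduced for illustration; the specification does not define these data structures.

```python
from enum import Enum


class LineState(Enum):
    SHARED = 1
    MODIFIED = 2


class Processor:
    def __init__(self, pid):
        self.pid = pid
        self.cache = {}  # address -> (LineState, data)

    def flush_line(self, addr):
        """Remove the line; return its data only if it was modified."""
        state, data = self.cache.pop(addr, (None, None))
        return data if state is LineState.MODIFIED else None


class MemoryDirectory:
    """Hypothetical stand-in for memory directory interface unit 22."""

    def __init__(self):
        self.memory = {}   # home memory 17 contents
        self.sharers = {}  # address -> set of Processor objects with a copy

    def writeback(self, addr, data, source):
        self.memory[addr] = data
        self.sharers.get(addr, set()).discard(source)

    def flush(self, addr):
        # Forward the flush only to processors recorded as holding a copy.
        for proc in self.sharers.pop(addr, set()):
            data = proc.flush_line(addr)
            if data is not None:
                self.memory[addr] = data  # writeback preserves the modification


class ProcessorInterface:
    """Hypothetical stand-in for processor interface 24."""

    def __init__(self, local_procs, directory):
        self.local_procs = local_procs
        self.directory = directory

    def flush(self, addr):
        # Snoop the local bus for a modified copy.
        for proc in self.local_procs:
            state, _ = proc.cache.get(addr, (None, None))
            if state is LineState.MODIFIED:
                data = proc.flush_line(addr)
                self.directory.writeback(addr, data, proc)
                return "local_writeback"
        # Not held locally, or held unmodified: drop any local copies
        # and forward the flush request to the home directory.
        for proc in self.local_procs:
            proc.flush_line(addr)
        self.directory.flush(addr)
        return "forwarded"
```

Because the directory forwards the flush only to processors it has recorded as holding the line, processors without a copy see no flush traffic.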

Thus, it is apparent that there has been provided, in accordance with the present invention, a system and method for removing data from processor caches in a distributed multi-processor computer system that satisfy the advantages set forth above. Although the present invention has been described in detail, it should be understood that various changes, substitutions, and alterations may be made herein. For example, though shown as individual protocol schemes, different combinations of message processing may be performed according to the protocol scheme. Other examples may be readily ascertainable by those skilled in the art and may be made herein without departing from the spirit and scope of the present invention as defined by the following claims.

1. (canceled)
2. A method for maintaining cache line information, the method comprising: storing directory state information for a cache line, wherein the directory state information for the cache line is associated with a directory state priority and a directory protection value for the cache line; receiving a request relating to the cache line from a requestor; extracting a read request priority from the received read request; comparing the extracted read request priority with the directory state priority; identifying that the request can be evaluated for further processing based on the comparing of the read request priority with the directory state priority, wherein the further processing of the request is contingent upon a protection level associated with the request and the directory protection value for the cache line; identifying the protection level associated with the request; and identifying that the requestor has permission to access the cache line based on the protection level associated with the request and the directory protection value for the cache line, wherein the request is performed in accordance to the directory protection value for the cache line and the protection level associated with the request.
3. The method of claim 2, wherein: a first protection value of a plurality of protection values identifies that local read and local write accesses to the cache line are authorized and that remote read accesses and remote write accesses to the cache line are not authorized, a second protection value of the plurality of protection values identifies that the local read and the local write accesses to the cache line are authorized and that remote read only accesses to the cache line are authorized, a third protection value of the plurality of protection values identifies that the local read and the local write accesses to the cache line are authorized and that the remote read accesses and remote write accesses to the cache line are authorized, and a fourth protection value of the plurality of protection values identifies that local read only accesses to the cache line are authorized and that the remote read only accesses to the cache line are authorized.
4. The method of claim 1, further comprising: maintaining information regarding a state of the cache line, wherein the information includes address information, counter information, and negative acknowledge information; and updating the state of the cache line based on the identifying that the requestor has the permission to access the cache line.
5. The method of claim 4, wherein the updated state of the cache line corresponds to a shared state.
6. The method of claim 4, wherein the updated state of the cache line corresponds to an exclusive state.
7. The method of claim 1, wherein the request was serviced successfully, and the directory state priority is reset to a zero priority after the request was serviced successfully.
8. The method of claim 1, wherein the request was received over a first virtual channel and a reply is sent to the requestor when the request is performed over a second virtual channel.
9. A non-transitory computer readable storage medium having embodied thereon a program executable by a processor for performing a method for maintaining cache line information, the method comprising: storing directory state information for a cache line, wherein the directory state information for the cache line is associated with a directory state priority and a directory protection value for the cache line; receiving a request relating to the cache line from a requestor; extracting a read request priority from the received read request; comparing the extracted read request priority with the directory state priority; identifying that the request can be evaluated for further processing based on the comparing of the read request priority with the directory state priority, wherein the further processing of the request is contingent upon a protection level associated with the request and the directory protection value for the cache line; identifying the protection level associated with the request; and identifying that the requestor has permission to access the cache line based on the protection level associated with the request and the directory protection value for the cache line, wherein the request is performed in accordance to the directory protection value for the cache line and the protection level associated with the request.
10. The non-transitory computer readable storage medium of claim 9, wherein: a first protection value of a plurality of protection values identifies that local read and local write accesses to the cache line are authorized and that remote read accesses and remote write accesses to the cache line are not authorized, a second protection value of the plurality of protection values identifies that the local read and the local write accesses to the cache line are authorized and that remote read only accesses to the cache line are authorized, a third protection value of the plurality of protection values identifies that the local read and the local write accesses to the cache line are authorized and that the remote read accesses and remote write accesses to the cache line are authorized, and a fourth protection value of the plurality of protection values identifies that local read only accesses to the cache line are authorized and that the remote read only accesses to the cache line are authorized.
11. The non-transitory computer readable storage medium of claim 9, further comprising: maintaining information regarding a state of the cache line, wherein the information includes address information, state information, counter information, and negative acknowledge information; and updating the state of the cache line based on the identifying that the requestor has the permission to access the cache line.
12. The non-transitory computer readable storage medium of claim 11, wherein the updated state of the cache line corresponds to a shared state.
13. The non-transitory computer readable storage medium of claim 11, wherein the updated state of the cache line corresponds to an exclusive state.
14. The non-transitory computer readable storage medium of claim 9, wherein the request was serviced successfully, and the directory state priority is reset to a zero priority after the request was serviced successfully.
15. The non-transitory computer readable storage medium of claim 9, wherein the request was received over a first virtual channel, a reply is sent to the requestor when the request is performed, and the reply was sent over a second virtual channel.
16. A system for maintaining cache line information, the system comprising: a plurality of computing nodes, wherein each computing node includes one or more processors, and each processor of the one or more processors executes instructions out of a memory of one or more memories; a plurality of cache memories, wherein each cache memory of the plurality of cache memories stores a portion of data of a shared coherent data; a computer network that interconnects each of the plurality of computing nodes, wherein at least one computing node of the plurality of computing nodes: stores directory state information for a cache line, wherein the directory state information for the cache line is associated with a directory state priority and a directory protection value for the cache line, receives a request relating to the cache line from a requestor, extracts a read request priority from the received read request, compares the extracted read request priority with the directory state priority, identifies that the request can be evaluated for further processing based on the comparing of the read request priority with the directory state priority, the further processing of the request is contingent upon a protection level associated with the request and the directory protection value for the cache line, identifies the protection level associated with the request, identifies that the requestor has permission to access the cache line based on the protection level associated with the request and the directory protection value for the cache line, and the request is performed in accordance to the directory protection value for the cache line and the protection level associated with the request.
17. The system of claim 16, wherein: a first protection value of a plurality of protection values identifies that local read and local write accesses to the cache line are authorized and that remote read accesses and remote write accesses to the cache line are not authorized, a second protection value of the plurality of protection values identifies that the local read and the local write accesses to the cache line are authorized and that remote read only accesses to the cache line are authorized, a third protection value of the plurality of protection values identifies that the local read and the local write accesses to the cache line are authorized and that the remote read accesses and remote write accesses to the cache line are authorized, and a fourth protection value of the plurality of protection values identifies that local read only accesses to the cache line are authorized and that the remote read only accesses to the cache line are authorized.
18. The system of claim 16, wherein: information is maintained regarding a state of the cache line, the information includes address information, counter information, and negative acknowledge information, and the state of the cache line is updated based on the identifying that the requestor has the permission to access the cache line.
19. The system of claim 18, wherein the updated state of the cache line corresponds to a shared state.
20. The system of claim 18, wherein the updated state of the cache line corresponds to an exclusive state.
21. The system of claim 16, wherein the request was serviced successfully, and the directory state priority is reset to a zero priority after the request was serviced successfully.